Add UpdatePodSandbox to NRI plugin API #141
Conversation
Force-pushed from 92689e4 to f7e11c2
Force-pushed from 1b158a0 to 33821eb
cc @MikeZappa87 This may have some interesting properties for networking plugins.
Now that pods can be updated, should we extend …
Force-pushed from 33821eb to 5688820
Good question/point. And if we do that, we should also consider adding an (unsolicited) UpdatePodSandboxes next to UpdateContainers in the protocol.
Force-pushed from 5688820 to 7121d70
Can the container runtime make changes to the Pod sandbox's resources? I thought that was managed (at least for now) by the kubelet.
I am pretty certain that if the PodSandboxStatus RPC returns additional labels or annotations that the kubelet doesn't know about, a test will fail. I believe I broke that last year :-) See the comment at https://github.com/kubernetes/cri-api/blob/82e1e9174de30efdb4f82fbb5251c31e4a7ce34a/pkg/apis/runtime/v1/api.proto#L589
@chrishenzie Not at the moment... And actually, considering that pod-level cgroup management is in the kubelet and not in the runtime, I think we are trying to do this the wrong way here. At the NRI level this needs to be an event-type message (with no possibility for the plugin to request changes to the pod in response). And if this is true, it would (maybe) be better to add it as a new Event enumeration and pass it to the stub/plugin on the wire as a StateChange event, just like all the other similar ones (or then maybe it wouldn't... see the later comment related to StateChange).
pkg/adaptation/adaptation.go
@@ -208,6 +209,17 @@ func (r *Adaptation) RunPodSandbox(ctx context.Context, evt *StateChangeEvent) e
	return r.StateChange(ctx, evt)
}

// UpdatePodSandbox relays the corresponding CRI request to plugins.
func (r *Adaptation) UpdatePodSandbox(_ context.Context, _ *UpdatePodSandboxRequest) (*UpdatePodSandboxResponse, error) {
After thinking more about this, I think this should be an event: unlike for containers, pod cgroup management is in the kubelet. Therefore there is not much point in allowing a plugin to respond with requested changes to the update. There is nothing the runtime could do to make that happen.
Currently all other similar events are passed on the wire using a StateChange with an Event enumeration indicating which event is being passed to the stub/plugin. If we go that route, this would then become just a new enumeration, similar to how PostUpdatePodSandbox is now added. However, if we want to pass to the plugin both the old and the updated pod resources, that's not possible with a StateChange message in its current form (we'd need to add a new field there).
In any case, I'd like to update the protocol and unfold all StateChange-represented events into separate RPC calls with call-specific messages (here's a WiP branch doing that) before we consider tagging a 1.0 NRI version. With that in mind, maybe the best way forward here is not to try to shove this into a new StateChange-delivered event but instead have it as a separate RPC call already (like it is now)... Thoughts on that?
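To make the two alternatives concrete, here is a minimal Go sketch of the plugin-facing shapes being discussed. The type and parameter names are illustrative assumptions for this sketch only, not the actual NRI API.

```go
// Package sketch illustrates the two alternatives discussed above.
package sketch

import "context"

// Placeholder types standing in for the NRI api package; illustrative only.
type PodSandbox struct{}
type LinuxResources struct{}

// Alternative 1: deliver the update as a plain event. The plugin can only
// observe the change; it has no way to request modifications in response.
type eventStylePlugin interface {
	UpdatePodSandbox(ctx context.Context, pod *PodSandbox) error
}

// Alternative 2: a dedicated RPC with call-specific messages, which could
// carry both the old and the updated pod-level resources (parameter names
// here are assumptions, not the actual API).
type rpcStylePlugin interface {
	UpdatePodSandbox(ctx context.Context, pod *PodSandbox,
		old, updated *LinuxResources) error
}
```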
Is there a likelihood that pod-level cgroup management could shift to the container runtime in the future? If so I think it would be more intuitive for this to not be just an event but to also have its own response type (which can be empty for now per your other feedback).
It looks like this method signature would fit in with the approach in your branch so I'd be inclined to not make any changes here, but I'd defer to your preference.
Yes, I agree. I think we should stick to the normal RPC signatures on the wire and just have the returned UpdatePodSandboxResponse struct be empty. And in the stub/plugin interface, it is probably better for the handler signature to return only an error, just like you have it there now.
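A minimal sketch of that split, using simplified message types (the names mirror the PR, but the field details are assumptions):

```go
package sketch

import "context"

// Wire-level messages: a normal request/response RPC, with an (at least for
// now) intentionally empty response, since the runtime cannot act on any
// pod-level changes a plugin might request.
type UpdatePodSandboxRequest struct {
	// pod, old and updated resources would go here
}
type UpdatePodSandboxResponse struct{}

// Stub/plugin-facing handler: only an error in the signature, as suggested.
type UpdatePodSandboxHandler func(ctx context.Context, req *UpdatePodSandboxRequest) error
```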
> Is there a likelihood that pod-level cgroup management could shift to the container runtime in the future? If so I think it would be more intuitive for this to not be just an event but to also have its own response type (which can be empty for now per your other feedback).
I know some folks who'd like to at least give some serious consideration for that to happen in the long run.
Kubelet is only one CRI user, albeit the primary one.
I like the idea that certain pod meta fields should be managed only by the owning client, and that in some cases we can also have overlay rights to fields owned by physical devices managed by the kernel or by said device services. In other words, if the container runtime and/or a plugin is configured as the manager of a particular device, then we should keep an unmolested copy of the owning client's pod meta, plus a copy of the device plugin's edits that would otherwise have to be done by a runc proxy. That way we have the kubelet's pod meta and the device plugin's pod meta, and the same for containers. We would normally do such overrides in the container runtime and/or runc via container runtime config or runc OCI plugins.
@chrishenzie @askervin I think the answer to both of these questions is probably 'no, we should not'. Unlike for containers, pod-level cgroup management is done in the kubelet. There is nothing the runtime can do if a plugin requested changes, either unsolicited or in response to an UpdatePodSandbox. So we should not pretend otherwise at the protocol level either.
We were hoping to get the Pod labels/annotations added to the request message. The reason behind this is that if a label is updated on a K8s pod, the container runtime is never updated with this state, so etcd becomes the source of truth. I think adding this will help the NRI ecosystem move service meshes and possibly other network plugins over to NRI. Links to the fields.
If we have label (and maybe also annotation) information on the PodSandboxXXXXX hooks, then we remove all need for watching the apiserver in network integrations like CNI plugins and service meshes. It would be a really nice addition that solves a long-standing problem: in these hooks, the runtime is the source of truth for the pod lifecycle on the node. Components that have to go to the apiserver are always going to race, since the apiserver holds the desired state, which does not necessarily match the current state in the runtime. (This also solves problems when a node gets disconnected temporarily, or has issues connecting to the control plane.)
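For illustration, a small Go sketch of what a networking-oriented plugin could do with such information. Note that labels and annotations are not part of the proposed request message; the pod view and the label key below are purely hypothetical assumptions.

```go
package sketch

import (
	"context"
	"fmt"
)

// Hypothetical pod view, assuming labels and annotations were propagated on
// pod updates (they are not part of the current proposal).
type PodSandbox struct {
	Name        string
	Namespace   string
	Labels      map[string]string
	Annotations map[string]string
}

// A networking plugin could then react to label changes using the runtime's
// view of the pod, without watching the apiserver.
func onUpdatePodSandbox(_ context.Context, pod *PodSandbox) error {
	// The label key below is purely illustrative.
	if mode, ok := pod.Labels["mesh.example.org/dataplane-mode"]; ok {
		fmt.Printf("reconfiguring networking for %s/%s (mode %q)\n",
			pod.Namespace, pod.Name, mode)
	}
	return nil
}
```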
@chrishenzie @askervin Looking at my comments from yesterday, it's probably warranted to try to give a somewhat more coherent summary that does not read like a diary entry from an H.P. Lovecraft story:
Looking at the CRI protocol definition changes in the KEP, it looks to me as if those …
This seems to imply different semantics than updating the pod's annotations: rather, optionally annotating (how to apply) the updates instead. Looking at the runtime implementations for …
If we'd prefer not to carry over (maybe even only semi-accidentally) a relic to the new …
Force-pushed from 409373f to 21fec05
@klihub Agree, annotations are good for experimentation but then become an ugly API and extension point, creating tight coupling between the implementations... We need to think about something better. I will check with the DRA folks; since this is about resources, we may be able to model this as a DRA problem, which has better semantics to support what we want.
@aojea Well, I meant something much simpler with my comment (and I think being able to propagate pod annotation/label updates to the runtime could be useful). All I wanted to say is that if we want to use the annotations in that message to update pod annotations, then we should bring this up in the KEP discussion. And also change the related comment in the protocol description, because it looks like something that was simply copied from the existing …
Following along the discussion of annotations, I think we should probably just drop the annotations field from the …
I don't think tacking these onto the …
@chrishenzie I think we'd still need to get rid of the usage of StateChange in the adaptation-side implementation of UpdatePodSandbox. Maybe something like this: klihub@7b1b3e0
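For context, a rough sketch of what a StateChange-free adaptation-side relay could look like; the types and plumbing here are simplified assumptions, and the linked branch is the authoritative shape.

```go
package sketch

import "context"

type UpdatePodSandboxRequest struct{}
type UpdatePodSandboxResponse struct{}

// Minimal plugin handle abstraction, just enough for the sketch.
type plugin interface {
	updatePodSandbox(ctx context.Context, req *UpdatePodSandboxRequest) error
}

type Adaptation struct {
	plugins []plugin
}

// UpdatePodSandbox relays the CRI-originated update to each plugin as its own
// call instead of wrapping it into a generic StateChange event.
func (r *Adaptation) UpdatePodSandbox(ctx context.Context, req *UpdatePodSandboxRequest) (*UpdatePodSandboxResponse, error) {
	for _, p := range r.plugins {
		if err := p.updatePodSandbox(ctx, req); err != nil {
			return nil, err
		}
	}
	return &UpdatePodSandboxResponse{}, nil
}
```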
Annotations are a good pattern for exploring the idea that we should have one set of client-owned annotations reported through the CRI APIs to the CRI client, and another set of annotations that overlay (add to) the client-managed set at the device layer. We could report the additional annotations to the client in a new field or in the info map.
Nod... IMO any need to keep "extra" annotations and/or ENVs between the plugins and devices should be kept in the container runtime's pod/container store as extras, and either not reported to the client or, if reported, then as additional fields and/or placed temporarily in info maps. The goal is to avoid such overrides/additions being applied, unmanaged, through OCI hooks.
This API is called by the container runtime to update pod sandbox resources when they change. This is for the In-place Pod Resize KEP-1287. Signed-off-by: Chris Henzie <[email protected]>
Force-pushed from 21fec05 to e4ce8c1
We should keep in mind that K8s and the container runtime may have different states for the labels, since you can apply a label to a running pod and the container runtime is never updated.
LGTM
LGTM.